The main goal of this course project is to build a simple word prediction application that saves the user some typing time. The study is based on the SwiftKey datasets. Because of computational limitations, we used only a random sample of about 20% of the data, combined into a single corpus.
The app uses two simple algorithms: SBO and correlation. The stupid back-off (SBO) algorithm predicts the next word based on the frequencies of n-grams. SBO performs well for frequently used words that carry little meaning, such as articles, prepositions, and pronouns.
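The back-off idea can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the function names `build_ngram_counts` and `sbo_predict`, the maximum n-gram order of 3, and the discount `ALPHA = 0.4` (the value commonly used with stupid back-off) are all assumptions made for the example.

```python
from collections import Counter

ALPHA = 0.4  # assumed back-off discount, the value commonly used with SBO


def build_ngram_counts(tokens, max_n=3):
    """Count every n-gram of order 1..max_n in a list of tokens."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[n][tuple(tokens[i:i + n])] += 1
    return counts


def sbo_predict(counts, context, top_k=3, alpha=ALPHA):
    """Rank candidate next words, backing off to shorter contexts."""
    max_n = max(counts)
    # Keep at most (max_n - 1) trailing words of the context.
    context = tuple(context)[-(max_n - 1):] if max_n > 1 else ()
    total_unigrams = sum(counts[1].values())
    scores = {}
    penalty = 1.0
    # Longest context first; each back-off step multiplies the score by alpha.
    for k in range(len(context), -1, -1):
        ctx = context[len(context) - k:]
        if k == 0:
            for (word,), c in counts[1].items():
                scores.setdefault(word, penalty * c / total_unigrams)
        else:
            ctx_count = counts[k][ctx]
            if ctx_count:
                for gram, c in counts[k + 1].items():
                    if gram[:-1] == ctx:
                        # setdefault keeps the score from the highest order.
                        scores.setdefault(gram[-1], penalty * c / ctx_count)
        penalty *= alpha
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


tokens = "the cat sat on the mat the cat ate the fish".split()
counts = build_ngram_counts(tokens)
print(sbo_predict(counts, ["on", "the"]))
```

A word found after the full context keeps its relative frequency as its score, while a word found only after a shorter context is discounted by `alpha` once per back-off step. The linear scan over all n-grams is kept for readability; a real application would index n-grams by their prefix.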
The second part of the prediction algorithm uses only “meaningful” words to predict the next “meaningful” word, based on word correlation within a single text block of a few sentences. This part is our attempt to improve on the SBO algorithm by using all of the text, not just the last n-gram. The correlation algorithm is much less accurate than SBO, because the most common words are excluded. Its predictions are associated with the previous text, so this algorithm not only predicts but also helps the user find the words they need.
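One way to picture the correlation step is the sketch below. The text does not state which correlation measure is used, so this example assumes the phi coefficient (the Pearson correlation of two binary "word occurs in block" indicators). The stop-word list, the tokenizer, and the names `build_cooccurrence`, `phi` and `correlation_predict` are likewise hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stop-word list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "i", "it", "is"}


def content_words(block):
    """Return the set of non-stop-words in a text block."""
    return {w for w in re.findall(r"[a-z']+", block.lower())
            if w not in STOPWORDS}


def build_cooccurrence(blocks):
    """Count in how many blocks each word and each word pair occurs."""
    word_blocks, pair_blocks = Counter(), Counter()
    for block in blocks:
        words = content_words(block)
        word_blocks.update(words)
        pair_blocks.update(frozenset(p)
                           for p in combinations(sorted(words), 2))
    return word_blocks, pair_blocks, len(blocks)


def phi(a, b, word_blocks, pair_blocks, n_blocks):
    """Phi coefficient between the block-occurrence indicators of a and b."""
    n_a, n_b = word_blocks[a], word_blocks[b]
    n_ab = pair_blocks[frozenset((a, b))]
    denom = math.sqrt(n_a * n_b * (n_blocks - n_a) * (n_blocks - n_b))
    return (n_blocks * n_ab - n_a * n_b) / denom if denom else 0.0


def correlation_predict(text, model, top_k=3):
    """Rank unseen content words by summed correlation with the text so far."""
    word_blocks, pair_blocks, n_blocks = model
    seen = content_words(text) & set(word_blocks)
    scores = {
        cand: sum(phi(cand, w, word_blocks, pair_blocks, n_blocks)
                  for w in seen)
        for cand in word_blocks if cand not in seen
    }
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


blocks = [
    "coffee with milk and sugar",
    "hot coffee and fresh milk",
    "the dog chased the ball",
    "a dog and a ball in the park",
]
model = build_cooccurrence(blocks)
print(correlation_predict("I drink coffee", model))
```

Unlike the n-gram approach, the ranking here depends on every content word typed so far rather than only on the last few words, which is why the suggestions follow the topic of the preceding text.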
SBO is the main word prediction method, and correlation is the auxiliary one. The words from these two predictions could simply be combined into a single list, or used as the starting point for a more complicated grammar-aware prediction algorithm, which is outside the scope of this study.
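Combining the two predictions into a single list might look like the sketch below. The function name `combine_predictions`, the `limit` parameter, and the choice to list SBO words first (because SBO is the main method) are assumptions for illustration.

```python
def combine_predictions(sbo_words, corr_words, limit=5):
    """Merge two ranked word lists, SBO first, dropping duplicates."""
    combined = []
    for word in list(sbo_words) + list(corr_words):
        if word not in combined:
            combined.append(word)
    return combined[:limit]


print(combine_predictions(["the", "of", "milk"], ["milk", "sugar"], limit=4))
```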
Code for this project: https://github.com/OleksandrMyronov/Capstone_Project